Quantitative DMS mapping for automated RNA secondary structure inference
Abstract
For decades, dimethyl sulfate (DMS) mapping has informed manual modeling of RNA structure in vitro and in vivo. Here, we incorporate DMS data into automated secondary structure inference using a pseudo-energy framework developed for 2'-OH acylation (SHAPE) mapping. On six non-coding RNAs with crystallographic models, DMS-guided modeling achieves overall false negative and false discovery rates of 9.5% and 11.6%, comparable to or better than SHAPE-guided modeling, and non-parametric bootstrapping provides straightforward confidence estimates. Integrating DMS/SHAPE data and including CMCT reactivities give small additional improvements. These results establish DMS mapping, an already routine technique, as a quantitative tool for unbiased RNA structure modeling.
Understanding the many biological functions of RNAs, from genetic regulation to catalysis, requires accurate portraits of the RNAs' folds. Among biochemical tools available for interrogating RNA structure, chemical mapping or "footprinting" uniquely permits rapid characterization of any RNA or ribonucleoprotein system in solution at single-nucleotide resolution [see, e.g., refs. (1, 2)]. Chemical mapping is being advanced by several groups through new approaches for chemical modification, coupling to high-throughput readouts, rapid data processing, high-throughput mutagenesis, and incorporation into structure prediction algorithms (3–7).
Perhaps the most widely used RNA chemical probe is dimethyl sulfate (DMS) (8–11). DMS modification of the Watson-Crick edge of adenosines or cytosines (at N1 or N3, respectively) blocks reverse transcription, so that reactivities can be obtained by primer extension at single-nucleotide resolution. Nucleotides that appear most strongly protected from or reactive to DMS can be inferred to be base-paired or unpaired, respectively; this qualitative or 'binary' information can be used for RNA structure modeling by manual or automatic methods (10, 12).
More recently developed methods, such as selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) (6), give reactivities that correlate with Watson-Crick base pairing for all nucleotide types, providing more data points than DMS. Indeed, when incorporated into free energy minimization algorithms as pseudo-energy bonuses, SHAPE data can recover RNA secondary structures with high accuracy (11), and non-parametric bootstrapping can identify regions with poor confidence (13). Nevertheless, this pseudo-energy framework has not been leveraged for earlier chemical approaches such as DMS mapping, despite the wide use of these data in in vitro, in vivo, and in virio contexts (9, 12, 14, 15).
We present herein a benchmark of pseudo-energy-guided secondary structure modeling based on DMS data for six non-coding RNAs: unmodified E. coli tRNA (16), the P4-P6 domain of the Tetrahymena group I ribozyme (17), E. coli 5S rRNA (12), and three ligand-bound domains from bacterial riboswitches [the V. vulnificus add adenine riboswitch (18), V. cholerae cyclic di-GMP riboswitch (19), and F. nucleatum glycine riboswitch (20)]. In all cases, crystallographic data, confirmed by solution analyses with the two-dimensional mutate-and-map approach (21), have provided 'gold-standard' secondary structures (Supporting Table S1) for evaluating the method's accuracy. The challenging nature of this benchmark was confirmed by the poor accuracy of the RNAstructure algorithm without data (Table 1). These models miss 38% of true helices (false negative rate, FNR), and 45% of the returned helices are incorrect (false discovery rate, FDR). We measured DMS reactivities and estimated their errors from three to eight replicates for each of the six RNAs (Supporting Figures S4 to S9 and Table S1).
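The helix-level error rates quoted above follow directly from helix counts, using the definitions given with Table 1 (FNR = 1 - TP/Cryst.; FDR = FP/(TP + FP)). The short Python sketch below is illustrative only: the function name and argument names are hypothetical and not part of RNAstructure or of the original analysis scripts.

```python
def helix_error_rates(true_positives, false_positives, reference_helices):
    """Return (FNR, FDR) as fractions from helix counts.

    true_positives    -- predicted helices present in the reference structure (TP)
    false_positives   -- predicted helices absent from the reference structure (FP)
    reference_helices -- number of helices in the crystallographic model (Cryst.)
    """
    if reference_helices <= 0:
        raise ValueError("reference_helices must be positive")
    # False negative rate: fraction of reference helices that were missed.
    fnr = 1.0 - true_positives / reference_helices
    # False discovery rate: fraction of predicted helices that are incorrect.
    predicted = true_positives + false_positives
    fdr = false_positives / predicted if predicted else 0.0
    return fnr, fdr


if __name__ == "__main__":
    # Totals from the "No data" column of Table 1: TP = 26, FP = 21, Cryst. = 42.
    fnr, fdr = helix_error_rates(26, 21, 42)
    print(f"FNR = {fnr:.1%}, FDR = {fdr:.1%}")
```

With the "No data" totals from Table 1 (26 true-positive and 21 false-positive helices against 42 crystallographic helices), this gives the 38.1% FNR and 44.7% FDR reported there; sensitivity and PPV are simply 1 - FNR and 1 - FDR.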
Analogous to prior SHAPE studies (11, 13), we incorporated these DMS data into RNAstructure by transforming them into pseudo-energies, giving favorable energies or penalties depending on whether paired nucleotides were DMS-protected or reactive, respectively. We tested pseudo-energy frameworks based on both a previous ad hoc formula and an empirically derived statistical potential (inspired by techniques in 3D structure prediction; see Supporting Methods and Figure S1). The two methods gave consistent secondary structures. Because primer extension primarily reads out DMS reactivity at adenosines and cytosines, we excluded reactivities at other bases when performing structure modeling.
DMS-guided modeling of the six ncRNAs gave an FNR of 9.5% and an FDR of 11.6% (Table 1 and Figure 1; see also Table S2), more than three-fold better than without the data. These error rates are lower than those previously achieved by SHAPE-directed modeling [FNR: 17%; FDR: 21% on the same RNAs (13)]. Furthermore, the DMS-guided FNR and FDR values are equal to and lower than, respectively, the values for SHAPE-based measurements in which primer extension was carried out without deoxyinosine triphosphate (FNR: 9.6%, FDR: 13.6%) to avoid known artefacts (13).
We were surprised that DMS mapping gave similar or better information content compared to SHAPE data, as the latter provide reactivities at approximately twice the number of nucleotides per RNA. Indeed, restricting the algorithm to use SHAPE data at only adenines and cytosines, or at only guanines and uracils, gave worse models (see Supporting Table S3). Instead, an explanation derives from distinct SHAPE and DMS signatures at nucleotides that are not in Watson-Crick secondary structure but nevertheless form non-canonical interactions (see, e.g., A37 in the F. nucleatum glycine riboswitch; Supporting Fig. 2A).
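To make the role of the pseudo-energies concrete, the Python sketch below converts per-nucleotide reactivities into energy terms using the logarithmic form published for SHAPE-directed folding in ref. (11), ΔG = m ln(reactivity + 1) + b, and applies it to adenosines and cytosines only. The default slope and intercept (2.6 and -0.8 kcal/mol) are the published SHAPE values and are assumed here purely for illustration; this section does not state the parameters used for DMS. The function name and the clamping of negative reactivities are likewise assumptions, and the statistical potential described in the Supporting Methods is not reproduced.

```python
import math


def dms_pseudo_energies(sequence, reactivities, slope=2.6, intercept=-0.8):
    """Per-nucleotide pseudo-energies (kcal/mol); None where no DMS term applies.

    Hypothetical helper: uses the m * ln(reactivity + 1) + b form from
    SHAPE-directed folding, with assumed (SHAPE-derived) default parameters.
    """
    if len(sequence) != len(reactivities):
        raise ValueError("sequence and reactivities must have the same length")
    energies = []
    for base, reactivity in zip(sequence.upper(), reactivities):
        # DMS is read out mainly at A and C, so other bases get no term.
        if base not in "AC" or reactivity is None:
            energies.append(None)
            continue
        # Assumption: clamp small negative (background-subtracted) values to zero.
        reactivity = max(reactivity, 0.0)
        energies.append(slope * math.log(reactivity + 1.0) + intercept)
    return energies


if __name__ == "__main__":
    print(dms_pseudo_energies("ACGU", [0.0, 1.0, 2.0, 0.5]))
```

Under these assumed parameters, a fully protected A or C (reactivity 0) receives -0.8 kcal/mol, which rewards pairing, whereas a reactivity of 1 gives a penalty of about +1.0 kcal/mol; G and U positions carry no term.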
These nucleotides appear protected from the SHAPE reaction and thus receive pseudo-energies that incorrectly reward their pairing inside Watson-Crick secondary structure. However, these same nucleotides can expose their Watson-Crick edges to solvent and react strongly with DMS, signifying that they are outside Watson-Crick helices. DMS-guided modeling can thus return the correct secondary structure in regions where the SHAPE data cannot distinguish Watson-Crick from non-Watson-Crick base pairs (compare Supporting Figs. 2B and 3C).
FIGURE 1. Pseudo-energy-guided secondary structure models using DMS data on six non-coding RNAs. DMS data and secondary structure models for E. coli tRNA, the P4-P6 domain of the Tetrahymena group I ribozyme, E. coli 5S rRNA, the V. vulnificus add adenine riboswitch, V. cholerae cyclic di-GMP riboswitch, and F. nucleatum glycine riboswitch. Missed base pairs are highlighted with blue lines; mispredicted base pairs are indicated by orange lines. Helix bootstrap confidence values are shown in red.
FIGURE 2. Predictive power of DMS and SHAPE. Reactivity histograms for DMS (A) and SHAPE (B). (C) Receiver operating characteristic curves for predicting unpaired nucleotides given a reactivity threshold. The area under the curve (AUC) is 0.86 for DMS and 0.83 for SHAPE.
Table 1: Performance of free energy minimization guided by reactivity-derived pseudo-energies from DMS and SHAPE chemical modifications.
RNA             Cryst.   No data     DMS         SHAPE       DMS + SHAPE
                         TP    FP    TP    FP    TP    FP    TP    FP
tRNA            4        2     3     4     0     4     0     4     0
adenine rbs.    3        2     3     3     1     3     1     3     1
cdGMP rbs.      8        6     2     6     0     8     0     8     0
5S rRNA         7        1     9     6     3     6     3     6     3
P4-P6 RNA       11       10    1     10    1     9     1     9     1
glycine rbs.    9        5     3     9     0     8     0     9     0
Total           42       26    21    38    6     38    5     39    5
FNR                      38.1%       9.5%        9.5%        7.1%
FDR                      44.7%       11.6%       13.6%       11.4%
Sensitivity              61.9%       90.5%       90.5%       92.9%
PPV                      55.3%       88.4%       86.4%       88.6%
Abbreviations: TP, true positives; FP, false positives; Cryst., number of helices in crystallographic model; FNR, false negative rate = 1 - TP/Cryst.; FDR, false discovery rate = FP/(TP + FP); Sensitivity = 1 - FNR; PPV, positive predictive value = 1 - FDR.
Reactivity histograms (Figure 2A and 2B) further support the enhanced predictive power of DMS relative to SHAPE. DMS mapping better distinguishes between nucleotides inside Watson-Crick helices and nucleotides outside helices (see also the receiver operating characteristic curves; Figure 2C).
Like SHAPE-guided modeling, DMS-directed structure inference still produces errors (Table 1), e.g., for the central junction of the 5S rRNA (Supporting Fig. 2E and 2F). Some of these errors may be resolved through better incorporation of the DMS-derived pseudo-energies at, e.g., 'singlet' base pairs (Supporting Fig. 2E). Nevertheless, as with SHAPE modeling, these erroneous regions can be pinpointed by estimating helix-by-helix confidence values through non-parametric bootstrapping [Supporting Methods and ref. (13); see also Supporting Figure S3]. For example, this procedure gives high confidence (≥ 90%) at almost all helices in the glycine riboswitch but low confidence values (< 50%) throughout the 5S rRNA DMS model (Figure 1).
For many applications, DMS and SHAPE measurements can be carried out in parallel, so we sought to determine whether their combination might improve automated secondary structure inference. Application of both sets of pseudo-energies gave a slight improvement in the algorithm's accuracy (FNR of 7.1% and FDR of 11.4%). In addition, we performed measurements with a reagent that primarily modifies the Watson-Crick edges of guanosines and uridines, 1-cyclohexyl-3-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMCT) (22).
Incorporation of the CMCT data into RNAstructure gave less accurate models than the DMS- or SHAPE-guided modeling above (FNR of 14.3%, FDR of 18.2%; see Supporting Table S4), consistent with weaker discrimination between paired and unpaired residues (Supporting Figure S1); and integrating CMCT with DMS and/or SHAPE data did not improve accuracy (Supporting Table S2).
The benchmark results presented herein establish that chemical mapping with dimethyl sulfate (DMS) can achieve prediction accuracies comparable to those of the SHAPE protocol when pseudo-energies are used to guide free energy minimization. DMS has been extensively used both in vitro and in vivo for time-resolved RNA folding, precise thermodynamic analysis, and mapping of RNA/protein interfaces (9, 12, 14, 15, 22). Sophisticated techniques for optimizing the reaction rate and its quenching have been developed (9, 23). Applying automated structure modeling, as demonstrated herein, will enable researchers to take better advantage of this large body of previous work. Furthermore, future studies may find it advantageous to perform both DMS and SHAPE approaches in parallel. Along with bootstrapping (13), comparison of separate DMS-guided and SHAPE-guided secondary structure models will permit rapid assessment of systematic errors and thus provide more accurate inferences.
ACKNOWLEDGEMENT
We thank the authors of RNAstructure for making the source code freely available and members of the Das lab for comments on the manuscript.
SUPPORTING INFORMATION
Supporting methods, figures, and model accuracy tables are available free of charge at http://pubs.acs.org.
REFERENCES
1. Black, D. L., and Pinto, A. L. (1989) U5 small nuclear ribonucleoprotein: RNA structure analysis and ATP-dependent interaction with U4/U6, Molecular and Cellular Biology 9, 3350-9.
2. Moazed, D., and Noller, H. F. (1991) Sites of interaction of the CCA end of peptidyl-tRNA with 23S rRNA, Proceedings of the National Academy of Sciences of the United States of America 88, 3725-8.
3. Mitra, S., Shcherbakova, I. V., Altman, R. B., Brenowitz, M., and Laederach, A. (2008) High-throughput single-nucleotide structural mapping by capillary automated footprinting analysis, Nucleic Acids Research 36, e63.
4. Yoon, S., Kim, J., Hum, J., Kim, H., Park, S., Kladwang, W., and Das, R. (2011) HiTRACE: high-throughput robust analysis for capillary electrophoresis, Bioinformatics 27, 1798-1805.
5. Kladwang, W., Cordero, P., and Das, R. (2011) A mutate-and-map strategy accurately infers the base pairs of a 35-nucleotide model RNA, RNA 17, 522-534.
6. Wilkinson, K. A., Merino, E. J., and Weeks, K. M. (2006) Selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution, Nature Protocols 1, 1610-6.
7. Lucks, J. B., Mortimer, S. A., Trapnell, C., Luo, S., Aviran, S., Schroth, G. P., Pachter, L., Doudna, J. A., and Arkin, A. P. (2011) Multiplexed RNA structure characterization with selective 2'-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq), Proceedings of the National Academy of Sciences of the United States of America 108, 11063-11068.
8. Peattie, D. A., and Gilbert, W. (1980) Chemical probes for higher-order structure in RNA, Proceedings of the National Academy of Sciences of the United States of America 77, 4679-4682.
9. Tijerina, P., Mohr, S., and Russell, R. (2007) DMS footprinting of structured RNAs and RNA-protein complexes, Nature Protocols 2, 2608-2623.
10. Mathews, D. H., Disney, M. D., Childs, J. L., Schroeder, S. J., Zuker, M., and Turner, D. H. (2004) Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure, Proceedings of the National Academy of Sciences of the United States of America 101, 7287-92.
11. Deigan, K. E., Li, T. W., Mathews, D. H., and Weeks, K. M. (2009) Accurate SHAPE-directed RNA structure determination, Proceedings of the National Academy of Sciences of the United States of America 106, 97-102.
12. Leontis, N. B., and Westhof, E. (1998) The 5S rRNA loop E: Chemical probing and phylogenetic data versus crystal structure, RNA 4, 1134-1153.
13. Kladwang, W., VanLang, C. C., Cordero, P., and Das, R. (2011) Understanding the errors of SHAPE-directed RNA structure modeling, Biochemistry 50, 8049-56.
14. Lempereur, L., Nicoloso, M., Riehl, N., Ehresmann, C., Ehresmann, B., and Bachellerie, J. P. (1985) Conformation of yeast 18S rRNA. Direct chemical probing of the 5′ domain in ribosomal subunits and in deproteinized RNA by reverse transcriptase mapping of dimethyl sulfate-accessible sites, Nucleic Acids Research 13, 8339.
15. Wells, S. E., Hughes, J. M., Igel, A. H., and Ares, M. (2000) Use of dimethyl sulfate to probe RNA structure in vivo, Methods in Enzymology 318, 479-493.
16. Byrne, R. T., Konevega, A. L., Rodnina, M. V., and Antson, A. A. (2010) The crystal structure of unmodified tRNAPhe from Escherichia coli, Nucleic Acids Research 38, 4154-4162.
17. Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Kundrot, C. E., Cech, T. R., and Doudna, J. A. (1996) Crystal structure of a group I ribozyme domain: principles of RNA packing, Science 273, 1678-1685.
18. Serganov, A., Yuan, Y.-R., Pikovskaya, O., Polonskaia, A., Malinina, L., Phan, A. T., Hobartner, C., Micura, R., Breaker, R. R., and Patel, D. J. (2004) Structural basis for discriminative regulation of gene expression by adenine- and guanine-sensing mRNAs, Chemistry & Biology 11, 1729-41.
19. Smith, K. D., Lipchock, S. V., Livingston, A. L., Shanahan, C. A., and Strobel, S. A. (2010) Structural and biochemical determinants of ligand binding by the c-di-GMP riboswitch, Biochemistry 49, 7351-7359.
20. Butler, E. B., Xiong, Y., Wang, J., and Strobel, S. A. (2011) Structural basis of cooperative ligand binding by the glycine riboswitch, Chemistry & Biology 18, 293-8.
21. Kladwang, W., VanLang, C. C., Cordero, P., and Das, R. (2011) A two-dimensional mutate-and-map strategy for non-coding RNA structure, Nature Chemistry 3, 954-62.
22. Planning, S. (2000) Probing RNA Structure with Chemical, Current Protocols in Nucleic Acid Chemistry (Beaucage, S. L., et al., Eds.) Chapter 6, 1-21.
23. Das, R., Karanicolas, J., and Baker, D. (2010) Atomic accuracy in predicting and designing noncanonical RNA structure, Nature Methods 7, 291-4.